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Summary 


A p-version of the least-squares finite element method, based on the velocity-pressure- 
vorticity formulation, is developed for solving steady-state incompressible viscous flow 
problems. The resulting system of symmetric and positive definite linear equations can be 
solved satisfactorily with the conjugate gradient method. In conjunction with the use of 
rapid operator application which avoids the formation of either element or global matrices, 
it is possible to achieve a highly compact and efficient solution scheme for the incompress- 
ible Navier-Stokes equations. Numerical results are presented for two-dimensional flow 
over a backward-facing step. The effectiveness of simple outflow boundary conditions is 
also demonstrated. 



1. Introduction 


In recent years, considerable interest has been shown in the development of the p and the 
h-p versions of the finite element method[l-3]. The p- version achieves increased accuracy 
of the approximate solution by increasing the order of the polynomials that make up 
element shape functions rather than by reducing the mesh size as in the h-version which 
uses relatively low order polynomials within each element. If the increase in polynomial 
order and the reduction in element size is made simultaneously or selectively according to 
local error criteria, we obtain the h-p version. The h-p version can achieve a high level of 
accuracy with relatively few degrees of freedom. 

Although the h-version is very flexible in its ability to handle complex geometries, 
h-adaptive calculations are difficult to implement, since the data structure to handle mesh 
refinements is not trivial. The p-version of finite elements was designed in order to make 
adaptive computations easier than with the h-version. While it is possible to achieve 
excellent results with higher-order Lagrange type shape functions[4], there are significant 
simplifications in programming effort when Legendre type shape functions are used. A 
particular choice of functions uses linear combinations of Legendre polynomials leading to 
hierarchic shape functions. This yields relatively well conditioned problems and nested 
matrices[5]. If the underlying solution is smooth, the convergence of the p-version to the 
solution is rapid, so that it is possible to reach a given level of accuracy with lesser degrees 
of freedom than with the h-version. Because of these advantages, the p-version and the 
h-p version are becoming increasingly popular in a variety of applications, especially for 
problems in solid mechanics. 

The finite element method has been employed in the solution of problems in incom- 
pressible fluid dynamics for almost two decades. In principle, any finite element method 
for incompressible viscous flows, (a review of some recent developments in this area are 
contained in a survey paper by Gresho[6], and in a monograph by Gunzberger[7]), can be re- 
alized by the p-version. However, in practice. It Is not easy to employ standard approaches 
with the p-version of the finite element method. Commonly used finite element meth- 
ods discretize the equations of incompressible fluid flow by the classical Galerkin mixed 
method, leading to a system of algebraic equations. The matrix corresponding to this sys- 
tem of equations is nonsymmetric due to the convection terms in the momentum equations, 
and non-positive-definite owing to the uncoupled nature of the continuity equation. The 
solution of this class of equations by existing iterative methods is not satisfactorily robust 
for many problems of practical interest, and direct factorization is considered the only 
viable approach for solving such systems of equations. But for large 2D and 3D problems, 
the computer storage required by a direct method becomes prohibitively large. For the 
P'Version, this difficulty becomes more severe, since the element matrices of the p-version 
method are large and dense. 

For solving steady-state incompressible viscous flow problems, an alternate approach 
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is the least-squares method, based on minimization of the residuals of the first-order equa- 
tions written in the velocity-pressure- vorticity formulation[8-10]. This method can accom- 
modate equal-order elements. For the linear Stoke’s equations it has been shown that this 
method has an optimal rate of convergence for velocity, pressure and vorticity [11]. This 
method has the important advantage of yielding discrete equations that are symmetric and 
positive definite and can be solved effectively by iterative methods. The use of iterative 
methods, such as the conjugate gradient method, can reduce significantly the storage re- 
quirement and computing time, because they work by improving an initial guess, and do 
not require the formulation and factorization of a global matrix. Therefore, in this study 
we incorporate the least-squares method with the p-version finite elements. The paper is 
organized as follows. In Section 2 we describe briefly the least-squares method. In Section 
3 we explain the idea of rapid operator application which allows significant reductions in 
the computation time and storage requirements for the solution of large problems with the 
p-version of the finite elements. In Section 4, we present results for the 2D flow over a 
backward-facing step using simple outflow boundary condition. In the final section we end 
with some concluding remarks. 


2. The Least-Squares Method for the Incompressible 
Navier-Stokes Equations 


The 2D steady-state incompressible Navier-Stokes equations can be written in the 
velocity-pressure formulation as: 

du dv 


du du dp 1 .d^u d^u. . _ 

dv dv dp 1 .d^v d^v 


( 1 ) 


Here all variables are nondimensionalized; are the velocity components, p is the 

pressure, and Re denotes the Reynolds number. Appropriate boundary conditions should 
be supplied to ensure that this is a well-posed boundary value problem. 

The classic Galerkin approach to discretizing these equations leads to nonsymmetric, 
nonpositive definite equations. One way to avoid this difficulty is to use the least-squares 
method upon this system of equations. 

Let us examine the general least-squares approach. Consider a boundary value prob- 
lem: 


Lu = f in fi, 
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( 2 ) 


Bu = g on r, 

where L is a general partial differential operator, B is a boundary operator, u is an unknown 
vector, / is a given vector-valued function, and ^ is a given vector-valued function on the 
boundary. 

For an arbitrary trial function u which satisfies the boundary condition Bu ^ g on T ^ 
we define the residual function: R = Tju — /. The least-squares method minimizes the 
following functional: 


I{u) = f R^RdQ - / (L« - ff{Lu - f)dil. 

Jn Jn ~~ ~ 


( 3 ) 


Taking the variation of I with respect to u and setting ^7 = 0 and Su — leads to 
the least-squares weak statement: find u satisfying B« = ^ on F such that : 




(Lt^)^(Lu - f)dn = 0. 


( 4 ) 


The least-squares weak statement corresponds to the Galerkin statement with the test 
functions w being replaced by TiW, ^ _ 

The finite element discretization of the least-squares weak statement(4) yields sym- 
metric, positive definite equations; however the condition number is of the order of the 
square of that obtained by Galerkin discretization, which can lead to slow convergence, or 
lack of convergence. Furthermore, in this case the operator L contains second-order deriva- 
tives, it is then necessary to use continuous shape functions, which are very awkward 
to use. 

To avoid both of the above problems the least-squares method is applied to the Navier- 
Stokes equations recast as first-order partial differential equations: 

du dv 
dx dy 

du du dp 1 du^ 
dv dv dp 1 div 
du dv 


( 5 ) 
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In the above equations, the vorticity u; is introduced purely in order to express the Navier- 
Stokes equations as a system of first-order partial differential equations while introducing 
the minimum number of additional variables. Since the system(5) is of first-order, the 
boundary conditions are very simple, and do not involve the derivatives of unknowns. A 
crucially important point is that even though vorticity is introduced as a variable^ it is 
not required to specify the vorticity on a boundary for the first-order equations(S)^ if the 
boundary conditions for the original set of equations(l) are related to only the velocity 
components (u,v) and the pressure p, and not to their derivatives. 

While this approach may be used with the h-version of finite elements, or even the 
finite difference method, we combine the least-squares method with the p- version of finite 
elements. In the next section we discuss the implementation of the method. 


3. Iterative Solution with Rapid Operator Application 

We will show in this section that the use of well known iterative methods in conjunction 
with rapid operation techniques, leads to a solution scheme that can achieve significant 
savings in storage for the solution of problems with the p-version. We consider here the 
conjugate gradient approach which can be applied to our case, because the matrices are 
symmetric and positive definite. 

The major computational component in this method is the multiplication of the global 
matrix by a global vector. Since the global matrix is formed by assembling individual 
element matrices, the multiplication can be done without forming the global matrix. This 
alone can greatly reduce the storage requirements over direct methods. This is one of the 
components of the element-by-element approach[l2]. 

If the storage is to be minimal, the element matrices should not be stored but should 
be recomputed at every iteration. With the h-version this is not very expensive, and it is 
possible to reform the element matrices at every iteration without paying a high penalty. 
However, with the p-version, the reformulation of the element matrix at every iteration, 
(particularly for 3-D problems with high levels of polynomials), can be exorbitantly ex- 
pensive in solution time. Furthermore, even a single element matrix can take up a large 
amount of storage; e.g., for a 3-D structural analysis problem using polynomials of order 9, 
a single element matrix occupies about 36 Mbytes when the symmetric part of the matrix 
is stored in double precision. 

We now show that with the use of simple and well known techniques, it is possible 
to get the matrix vector product without forming and storing the element matrices thus 
making the storage requirements close to minimal. The rapid operator application (in 
effect, the matrix vector product) is performed with less operations than when an element 
matrix is formed, i,e,, there is no penalty in computing time. 
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The key ingredient is the use of elements that employ tensor-product shape functions. 
Since such shape functions implicitly perform separation of variables, it is possible to 
group terms so that several terms are reused in the rapid operator application. The savings 
associated with the use of tensor-product shape functions are best illustrated with a simple 
example. Consider a rectangular element within which we define shape functions. The 
variation of a quantity, e.g., u, within this element can be expressed in the form 

u = Ucc ’^a{x,y) (a = 1,6^), (6) 

where Ua are the coefficients and are the shape functions. (In the above equation, 

repeated indices imply summation.) When the shape functions are in tensor-product form, 
each two-dimensional shape function is expressed as a product of one-dimensional shape 
functions. Thus, the above expression can be rewritten as 


u = 4>i{x)(j>m{y)Uim (/,m = l,6). ( 7 ) 

To illustrate the evaluation of an operator with these shape fimctions, we consider the 
evaluation of the product of the mass matrix by a vector; i.e., we wish to evaluate for 
values of a = 1,6^, the expression 


where the mass matrix is given by 


Map — jj: y)^p{x, y) dx dy. 


We need for a = 1,6^ 


= L L ^a{x,y)^p{x^y)Updxdydz (fe — 1,6^). 

Using the tensor-product form, we expand the above expression to obtain 

Drs — J j 4>r{x)^a{y)<f>l{x)<l>m{y)Ulmdxdy (1,771 = 1,6), 


( 8 ) 

( 9 ) 


( 10 ) 


( 11 ) 


where r, s each assumes all values between 1 and b. When we perform numerical integration, 
we replace integrals by weighted sums of the functions evaluated at the integration points. 
For simplicity, we assume that the number of integration points is equal to the number of 
shape functions. (This is not necessary, but greatly simplifies the explanation.) Thus, the 
above expression reduces to 


Dr, = (i, jf, Z, m = 1 , 6), (12) 
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where are the integration points and w{i),w{j) are the weights. We note here that 

quantities such as (f>r (a:’) are merely the one dimensional shape functions evaluated at the 
integration points and are matrices which we denote by (f>ri- We therefore need to evaluate 
for all values of r, s from 1 to b, the expression 

Drs = <^>ri^sjW{i)w{j)4>u4>mjUlm {i, j ,l , TTl ^ 1 ,b). (13) 

This is done by contracting the indices from the right, one step at a time. Thus, the 
operation count in evaluating the product of the mass matrix and a vector is now reduced 
to order By comparison, if the mass matrix were to be formed, it would have required 
order (6^) storage. The multiplication by a vector of length P, would take 6^ operations. 

The key point is that to evaluate the matrix-vector products, the element matrices 
are never formed, but rather computed on the fly, with numerical quadrature implicit 
in the evaluation. The savings associated with tensor-product shape functions are well 
known in mathematics (the multidimensional Fourier expansions is a good example), and 
in computer aided design[l3,14]. In recent years such shape functions have been elegantly 
exploited by Karniadakis, Orszag and his associates in the spectral element method used 
for fluid flow computations [15]; similar results have been obtained in solid mechanics[16j. 

The above derivation does not take into account the specific nature of the shape func- 
than that they are in tensor product form. The shape functions for the p- version 
are chosen so that their derivatives are nearly orthogonal over a cube, and the evaluation of 
all Dra could be performed without numerical integration in order b^ operations. However, 
the intention above is to illustrate the approach that is taken with more general curved 
elements, where a Jacobian is present within the integral so that orthogonality is lost. 

The approach described above can be used to evaluate the operator corresponding to 
the 2-D incompressible Navier-Stokes equations. The unknowns are the two components 
of the velocity u, v, the vorticity a?, and the pressure p. If we consider a single element with 
b^ shape functions, the size of an element matrix is approximately 45^ *46^. Multiplication 
by a vector of length 45^ would take 165^ operations, if advantage is not taken of the tensor 
product nature of the shape functions. When we use rapid operator application with the 
number of Gauss points equal to the number of shape functions, it can be shown that the 
number of floating point operations required are approximately 

32 * 6^ -f Order(b^ flerms 

This estimate still holds when the number of integration points is different from the number 
of shape functions, except that the constant is no longer 32. The value 32 is also related 
to the number of terms in the partial differential equation, and the ways in which they can 
be grouped together; a simpler equation with a lesser number of terms would have a lower 
constant. This approach is valid for elements that are distorted and also for nonlinear 
problems. We note here that when required, the element matrix can be formed in order 
b operations, as compared to 6® operations with the standard approach. 
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The program KGNFEM is an experimental finite element code that incorporates these 
ideas and serves as a framework for both fluid flow and structural analysis computations. 
We present below some results showing that large scale fluid flow computations can be 
performed with very modest storage requirements. 


4. Problem Description and Results 

The approach described above has been tested by solving the problem of flow past a 
backward-facing step. The back-facing step example has been chosen so as to be able to 
compare the results with the benchmark solution of Gartling[17]. The step has a height of 
0.5 and the channel downstream of this step has a height of 1.0. The total length L of the 
channel is 30. 

The boundary conditions are as follows: At the inlet, a parabolic velocity profile is 
specified for u, with the tangential velocity v set to zero. At the exit the only outflow 
boundary condition is p = 0. Along the walls, the no slip condition is used. Note that 
there are no boundary conditions specified for vorticity. 

The Reynolds number Re = UavgHI^ was 800 based on the height 5^ = 1.0 of the 
channel, the average inlet velocity Uavg = 0.6667, and the adjusted viscosity u, 

A total of 20 uniform finite elements were used in the solution of this problem. The 
orders of the polynomials used for the variables u^v^w^p were 15, 15, 14, and 14 respec- 
tively, resulting in a total of 14607 unknowns requiring about 1.25 Mbytes to solve the 
problem. (The equal orders 15, 15, 15, 15 were also tested; we didn’t find any visible 
difference between these two results.) Roughly speaking, the required storage is about 11 
times of number of unknowns in double precision words. 

In this numerical experiment, a simple successive substitution scheme is used to obtain 
the solution. At the beginning we assume that u = v = 0- The required number of 
substitutions is 80 for the L 2 norm of the residual to be less than the tolerance 10”^. In 
each substitution step, the linear algebraic system is solved by the preconditioned conjugate 
gradient method. Here we use the simple Jacobi(dlagonal) preconditioner. 

The computed results (velocity contours, pressure contours, and vorticity contours) 
are shown in Figure 1. We note that there are no oscillations in the computed results. 
We cdso should mention that we didn’t use any postprocessing to smooth the computed 
results. The re-attachment length of the circulating zone immediately behind the step and 
the detachment and re-attachment points on the upper wall of the channel compare well 
with the Gartling’s resiilts[17]. 

In order to test the effectiveness of the outflow boundary condition, we cut the length 
of the backward-facing step problem to a half. We imposed the condition p 0 on the exit 
boundary. Although p = constant is not compatible with the real physics at that location, 
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the method still gives reasonably good solutions and captured the two recirculation zones 
(see Figure 2). 

These numerical results verify the validity of the least-squares method of the Navier- 
Stokes equations using the p-version of the finite elements. 


5. Conclusions 

The Navier-Stokes equations have traditionally been the prime generators of nonsym- 
metric systems of equations. The least-squares approach described in this paper shows 
that it is possible to generate symmetric, positive definite systems of equations which can 
be robustly solved by the conjugate gradient method. The least-squares method does not 
require the use of artificial viscosity or upwind differencing. This method allows extremely 
simple boundary conditions; It is a clean, robust and consistent approach to solve the 
Navier-Stokes equations in primitive variable form. In conjunction with the the p-version 
of finite elements, and the rapid operator application, this approach leads to mi nimal 
storage requirements even for large-scale problems. 
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(a) Velocity 



(b) Pressure 



Fig. 1. Computed results for back ward- facing step with X = 30 at Re = 800. 
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Fig. 2. Computed results for backward-facing step with L = 15 at Re = 800(continued). 
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(c) Vorticity 


Fig. 2. Computed results for backward-facing step with i = 15 at i2e = 800. 
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